Time series shapelets: a novel technique that allows accurate, interpretable and fast classification

Authors: AM Martinez, E Keogh, Eamonn Keogh, F Wilcoxon, J Lin, L Breiman, Lexiang Ye, MK Jeong, MW Kadous, R Briandet, SL Salzberg
Publication venue: Springer Science and Business Media LLC

Source: Crossref

Full-length autonomous transposable elements are preferentially targeted by expression-dependent forms of RNA-directed DNA methylation

Authors: A Marí-Ordóñez, A Zemach, A-V Gendrel, AD McCue, AT Wierzbicki, B Langmead, B Zheng, C Hoede, C Llorens, D Garcia, D Pontier, D-L Yang, DM Bond, Drexel A. Neumann, ER Alvarez-Buylla, ER Havecker, F Tan, FK Teixeira, H Ito, H Li, H Stroud, IR Henderson, J Zhai, JA Law, JI Gent, Josquin Daron, JP Jackson, JR Haag, JT Cuperus, Kaushik Panda, KF Erhard, KM Creasey, L Liu, L Wu, Lexiang Ji, MA Matzke, MA Urich, MD Schultz, MJ Axtell, MJ Sigman, P Lamesch, Q Li, R Ye, R. Keith Slotkin, RD Finn, RJ Schmitz, RK Slotkin, Robert J. Schmitz, S Li, S Nuthikattu, SE Castel, T Blevins, VV Cavrak, W Aufsatz, W Bao, X Zhong, Y Onodera, Z Lippman, Z Xie
Publication venue: Springer Science and Business Media LLC

Source: Crossref

Learning from Time Series in the Resource-Limited Situations

Author: Ye Lexiang
Publication venue: eScholarship, University of California
Publication date: 01/01/2010

Many data mining tasks are based on datasets containing sequential characteristics, such as web search queries, medical monitoring data, motion capture records, and astronomical observations. In these and many other applications, a time series is a concise yet expressive representation. Much current data mining research on time series focuses on providing exact solutions for relatively small datasets. However, advances in storage techniques and the increasing ubiquity of distributed systems make realistic time series datasets orders of magnitude larger than most of those solutions can handle within their computational resource limits. Meanwhile, the approximate solutions proposed so far, such as dimensionality reduction and sampling, suffer from two drawbacks: they do not adapt to the available computational resources, and they often require complicated parameter tuning to produce high-quality results.

In this dissertation, we discuss anytime/anyspace algorithms as a way to address these issues. Anytime/anyspace algorithms are algorithms that, after a small amount of setup time/space, always have a best-so-far answer available; the quality of that answer improves as more computational time/space is provided. We show that by framing problems as anytime/anyspace algorithms, we can extract the most benefit from the available computational resources and provide correspondingly high-quality approximate solutions.

We further argue that relying on whole datasets is not always effective or efficient. When the data are noisy, using distinguishing local features rather than global features could mitigate the effect of noise. Moreover, building a concise model based on local features makes the computation much less expensive in both time and space. We introduce a new time series primitive, time series shapelets, as such a distinguishing feature. Informally, shapelets are time series subsequences that are in some sense maximally representative of a class. As we shall show with extensive empirical evaluations in diverse domains, classification algorithms based on the time series shapelet primitive can be interpretable, more accurate, and significantly faster than state-of-the-art classifiers.
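The abstract defines shapelets only informally ("in some sense maximally representative of a class") and gives no algorithm, so the following is a minimal, hypothetical sketch that combines its two ideas: a brute-force shapelet search written in anytime style. Several details are assumptions rather than the dissertation's method: candidate quality is scored by the information gain of a distance-threshold split (one common choice in the shapelet literature), distances are plain Euclidean with no normalization or pruning, the toy data are made up, and the names `subsequence_distance`, `best_split`, `anytime_shapelet_search` and `classify` are illustrative.

```python
import math
from collections import Counter
from typing import Iterator, List, Sequence, Tuple

# Best-so-far answer: (gain, shapelet, threshold, label_if_close, label_if_far)
Answer = Tuple[float, List[float], float, str, str]


def subsequence_distance(series: Sequence[float], shapelet: Sequence[float]) -> float:
    """Smallest Euclidean distance between the shapelet and any equal-length window of the series."""
    m = len(shapelet)
    best = math.inf
    for start in range(len(series) - m + 1):
        d = math.sqrt(sum((series[start + i] - shapelet[i]) ** 2 for i in range(m)))
        best = min(best, d)
    return best


def entropy(labels: Sequence[str]) -> float:
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values()) if n else 0.0


def majority(labels: Sequence[str]) -> str:
    return Counter(labels).most_common(1)[0][0]


def best_split(distances: Sequence[float], labels: Sequence[str]) -> Tuple[float, float, str, str]:
    """Distance threshold with the highest information gain (assumed quality measure)."""
    order = sorted(range(len(distances)), key=lambda i: distances[i])
    d = [distances[i] for i in order]
    y = [labels[i] for i in order]
    n, base = len(y), entropy(y)
    best = (0.0, math.inf, majority(y), majority(y))  # no useful split found yet
    for k in range(1, n):
        if d[k - 1] == d[k]:
            continue  # equal distances cannot be separated by a threshold
        left, right = y[:k], y[k:]
        gain = base - len(left) / n * entropy(left) - len(right) / n * entropy(right)
        if gain > best[0]:
            best = (gain, (d[k - 1] + d[k]) / 2, majority(left), majority(right))
    return best


def anytime_shapelet_search(train, labels, length) -> Iterator[Answer]:
    """Try every subsequence of the given length as a candidate shapelet.

    Yields the best-so-far answer after each candidate, so the caller can stop at any time.
    """
    best = None
    for series in train:
        for start in range(len(series) - length + 1):
            candidate = list(series[start:start + length])
            distances = [subsequence_distance(s, candidate) for s in train]
            gain, threshold, near, far = best_split(distances, labels)
            if best is None or gain > best[0]:
                best = (gain, candidate, threshold, near, far)
            yield best


def classify(series, shapelet, threshold, near, far) -> str:
    """One-shapelet decision stump: within the threshold means the 'near' class."""
    return near if subsequence_distance(series, shapelet) <= threshold else far


if __name__ == "__main__":
    # Made-up toy data: a spike versus a (mostly) flat signal.
    train = [[0, 0, 0, 5, 0, 0, 0], [0, 0, 5, 0, 0, 0, 0],
             [0, 0, 0, 0, 0, 0, 0], [0, 1, 0, 0, 0, 1, 0]]
    labels = ["spike", "spike", "flat", "flat"]
    for step, answer in enumerate(anytime_shapelet_search(train, labels, length=3)):
        if step >= 9:  # stop early; the best-so-far answer is still usable
            break
    gain, shapelet, threshold, near, far = answer
    print(gain, shapelet, threshold)                                        # 1.0 [0, 0, 5] 2.0
    print(classify([0, 0, 0, 0, 5, 0, 0], shapelet, threshold, near, far))  # spike
```

Because the search is a generator, the caller can stop iterating on any criterion (a candidate count, as here, or a wall-clock deadline) and still hold a usable answer; letting it run longer can only keep or raise the gain of the best-so-far shapelet.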

Sources: Ezid; eScholarship, University of California

Annotating Historical Archives of Images

Authors: Christian Shelton, Eamonn Keogh, Lexiang Ye, Xiaoyue Wang
Publication date: 01/01/2008

Recent initiatives like the Million Book Project and Google Print Library Project have already archived several million books in digital format, and within a few years a significant fraction of the world’s books will be online. While the majority of the data will naturally be text, there will also be tens of millions of pages of images. Many of these images will defy automatic annotation for the foreseeable future, but a considerable fraction may be amenable to automatic annotation by algorithms that link a historical image with a modern counterpart and its attendant metatags. To perform this linking, we must have a suitable distance measure that appropriately combines the relevant features of shape, color, texture and text. However, the best combination of these features will vary from application to application and even from one manuscript to another. In this work, we propose a simple technique to learn the distance measure by perturbing the training set in a principled way. We show the utility of our ideas on archives of manuscripts containing images from natural history and cultural artifacts.
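The abstract does not describe the learned measure itself, so this is only a hedged sketch of the general idea of a distance that combines several feature types and is used to link a historical image to its nearest modern image. Assumptions, none taken from the paper: each image is already reduced to one numeric vector per feature (`shape`, `color`, `texture`, `text`), per-feature distances are Euclidean, and they are combined as a weighted sum with hand-set weights. The paper's perturbation-based learning of the measure is not reproduced; `combined_distance`, `link` and all feature values are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

FEATURES = ("shape", "color", "texture", "text")  # assumed feature types, one vector each
Image = Dict[str, Sequence[float]]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def combined_distance(a: Image, b: Image, weights: Dict[str, float]) -> float:
    """Weighted sum of per-feature Euclidean distances (hypothetical combination rule)."""
    return sum(weights[f] * euclidean(a[f], b[f]) for f in FEATURES)


def link(historical: Image, modern: List[Tuple[str, Image]], weights: Dict[str, float]) -> str:
    """Return the metatag of the modern image nearest to the historical one."""
    tag, _ = min(modern, key=lambda item: combined_distance(historical, item[1], weights))
    return tag


if __name__ == "__main__":
    # Made-up vectors: the drawing matches "butterfly" in shape and "moth" in color.
    drawing = {"shape": [1, 0], "color": [0, 1], "texture": [0.5], "text": [0]}
    modern = [
        ("butterfly", {"shape": [1, 0], "color": [1, 0], "texture": [0.5], "text": [0]}),
        ("moth", {"shape": [0, 1], "color": [0, 1], "texture": [0.5], "text": [0]}),
    ]
    shape_heavy = {"shape": 1.0, "color": 0.1, "texture": 0.1, "text": 0.1}
    color_heavy = {"shape": 0.1, "color": 1.0, "texture": 0.1, "text": 0.1}
    print(link(drawing, modern, shape_heavy))  # butterfly
    print(link(drawing, modern, color_heavy))  # moth
```

Changing only the weights changes which modern image the drawing is linked to, which illustrates the abstract's point that the best combination of features varies between applications and manuscripts and therefore has to be learned rather than fixed.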

Source: CiteSeerX